RBCN: Rectified Binary Convolutional Networks with Generative Adversarial Learning 65

where $\odot$ denotes the element-wise multiplication that obtains the pruned weight with the mask $M_p$. The other part of the forward propagation in the pruned RBCNs is the same as in the RBCNs.

In pruned RBCNs, what needs to be learned and updated are the full precision filters $W_p$, the learnable matrices $C_p$, and the soft mask $M_p$. In each convolutional layer, these three sets of parameters are jointly learned.

Update $M_p$. $M_p$ is updated by FISTA [141] with the initialization $\alpha_{(1)} = 1$. Then we obtain the following.

$$\alpha_{(k+1)} = \frac{1}{2}\left(1 + \sqrt{1 + 4\alpha_{(k)}^2}\right), \tag{3.84}$$

$$y_{(k+1)} = M_{p,(k)} + \frac{\alpha_{(k)} - 1}{\alpha_{(k+1)}}\left(M_{p,(k)} - M_{p,(k-1)}\right), \tag{3.85}$$

$$M_{p,(k+1)} = \operatorname{prox}_{\eta_{(k+1)}\lambda\|\cdot\|_1}\!\left(y_{(k+1)} - \eta_{(k+1)}\,\frac{\partial\big(L_{Adv_p} + L_{Data_p}\big)}{\partial y_{(k+1)}}\right), \tag{3.86}$$

where $\eta_{(k+1)}$ is the learning rate in iteration $k+1$ and $\operatorname{prox}_{\eta_{(k+1)}\lambda\|\cdot\|_1}(z_i) = \operatorname{sign}(z_i)\cdot\big(|z_i| - \eta_{(k+1)}\lambda\big)_+$; more details can be found in [142].
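The mask update in Eqs. (3.84)–(3.86) is a standard FISTA proximal-gradient iteration with soft thresholding. A minimal NumPy sketch is given below; `grad_fn` stands in for the gradient of $L_{Adv_p} + L_{Data_p}$ with respect to $y$, and all names here are illustrative rather than the authors' implementation.

```python
import numpy as np

def soft_threshold(z, thresh):
    """Proximal operator of thresh * ||.||_1: sign(z) * (|z| - thresh)_+."""
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)

def fista_step(m_curr, m_prev, alpha, grad_fn, eta, lam):
    """One FISTA iteration for the soft mask M_p.

    Eq. (3.84): momentum coefficient update.
    Eq. (3.85): extrapolation point y.
    Eq. (3.86): proximal gradient step (soft thresholding).
    """
    alpha_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * alpha ** 2))
    y = m_curr + (alpha - 1.0) / alpha_next * (m_curr - m_prev)
    m_next = soft_threshold(y - eta * grad_fn(y), eta * lam)
    return m_next, m_curr, alpha_next
```

The $\ell_1$ proximal step is what drives mask entries toward exact zeros, so pruned filters are identified automatically during training rather than by a fixed threshold.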

Update $W_p$. Let $\delta_{W^l_{p,i}}$ be the gradient of the full precision filter $W^l_{p,i}$. During backpropagation, the gradients first pass to $\hat{W}^l_{p,i}$ and then to $W^l_{p,i}$. Furthermore,

$$\delta_{W^l_{p,i}} = \frac{\partial L_p}{\partial \hat{W}^l_{p,i}} = \frac{\partial L_{S_p}}{\partial \hat{W}^l_{p,i}} + \frac{\partial L_{Adv_p}}{\partial \hat{W}^l_{p,i}} + \frac{\partial L_{Kernel_p}}{\partial \hat{W}^l_{p,i}} + \frac{\partial L_{Data_p}}{\partial \hat{W}^l_{p,i}}, \tag{3.87}$$

and

$$W^l_{p,i} \leftarrow W^l_{p,i} - \eta_{p,1}\,\delta_{W^l_{p,i}}, \tag{3.88}$$

where $\eta_{p,1}$ is the learning rate, and $\frac{\partial L_{Kernel_p}}{\partial \hat{W}^l_{p,i}}$ and $\frac{\partial L_{Adv_p}}{\partial \hat{W}^l_{p,i}}$ are

$$\frac{\partial L_{Kernel_p}}{\partial \hat{W}^l_{p,i}} = -\lambda_1\left(W^l_{p,i} - C^l_p\,\hat{W}^l_{p,i}\right)C^l_p, \tag{3.89}$$

$$\frac{\partial L_{Adv_p}}{\partial \hat{W}^l_{p,i}} = -2\left(1 - D\big(T^l_{p,i}; Y_p\big)\right)\frac{\partial D_p}{\partial \hat{W}^l_{p,i}}. \tag{3.90}$$

And

$$\frac{\partial L_{Data_p}}{\partial \hat{W}^l_{p,i}} = -\frac{1}{n}\left(R_p - T_p\right)\frac{\partial T_p}{\partial \hat{W}^l_{p,i}}. \tag{3.91}$$
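The filter update in Eqs. (3.87)–(3.88) simply accumulates the four gradient terms and takes an SGD step. The sketch below writes out only the kernel-loss term of Eq. (3.89); `grad_s`, `grad_adv`, and `grad_data` stand in for the gradients of $L_{S_p}$, $L_{Adv_p}$, and $L_{Data_p}$ with respect to the binarized filter, and all names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def kernel_grad_w(w, w_hat, c, lam1):
    """Eq. (3.89): dL_Kernel_p / dW_hat = -lam1 * (W - C * W_hat) * C."""
    return -lam1 * (w - c * w_hat) * c

def update_w(w, w_hat, c, grad_s, grad_adv, grad_data, lam1, eta1):
    """Eqs. (3.87)-(3.88): sum the four gradient terms, then take an SGD step
    on the full precision filter W (eta1 plays the role of eta_{p,1})."""
    delta_w = grad_s + grad_adv + kernel_grad_w(w, w_hat, c, lam1) + grad_data
    return w - eta1 * delta_w
```

Note that the gradients are computed with respect to the binarized filter $\hat{W}^l_{p,i}$ but applied to the full precision filter $W^l_{p,i}$, as is usual in binary-network training.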

Update $C_p$. We further update the learnable matrix $C^l_p$ with $W^l_p$ and $M^l_p$ fixed. Let $\delta_{C^l_p}$ be the gradient of $C^l_p$. Then we have

$$\delta_{C^l_p} = \frac{\partial L_p}{\partial C^l_p} = \frac{\partial L_{S_p}}{\partial C^l_p} + \frac{\partial L_{Adv_p}}{\partial C^l_p} + \frac{\partial L_{Kernel_p}}{\partial C^l_p} + \frac{\partial L_{Data_p}}{\partial C^l_p}, \tag{3.92}$$

and

$$C^l_p \leftarrow C^l_p - \eta_{p,2}\,\delta_{C^l_p}, \tag{3.93}$$

where $\eta_{p,2}$ is the learning rate, and $\frac{\partial L_{Kernel_p}}{\partial C^l_p}$ and $\frac{\partial L_{Adv_p}}{\partial C^l_p}$ are

$$\frac{\partial L_{Kernel_p}}{\partial C^l_p} = -\lambda_1\sum_i\left(W^l_{p,i} - C^l_p\,\hat{W}^l_{p,i}\right)\hat{W}^l_{p,i}, \tag{3.94}$$
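Unlike the per-filter gradient of Eq. (3.89), the kernel-loss gradient with respect to $C^l_p$ in Eq. (3.94) sums over all filters $i$ of the layer. A minimal sketch, assuming the filters of one layer are stacked along axis 0 and all names are illustrative:

```python
import numpy as np

def kernel_grad_c(w_stack, w_hat_stack, c, lam1):
    """Eq. (3.94): -lam1 * sum_i (W_i - C * W_hat_i) * W_hat_i."""
    return -lam1 * np.sum((w_stack - c * w_hat_stack) * w_hat_stack, axis=0)

def update_c(c, w_stack, w_hat_stack, grad_other, lam1, eta2):
    """Eqs. (3.92)-(3.93): add the remaining gradient terms (grad_other stands
    in for the L_S, L_Adv, and L_Data contributions) and take an SGD step
    with learning rate eta2 (eta_{p,2})."""
    delta_c = grad_other + kernel_grad_c(w_stack, w_hat_stack, c, lam1)
    return c - eta2 * delta_c
```

Because $W^l_p$ and $M^l_p$ are held fixed during this step, the three updates of $M_p$, $W_p$, and $C_p$ alternate within each training iteration.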